The Statistics of Sequence Similarity Scores
Abstract
To assess whether a given alignment constitutes evidence for homology, it helps to know how strong an alignment can be expected to arise by chance alone. In this context, "chance" can mean the comparison of (i) real but non-homologous sequences; (ii) real sequences that are shuffled to preserve compositional properties [1-3]; or (iii) sequences that are generated randomly from a DNA or protein sequence model. Analytic statistical results invariably use the last of these definitions of chance, while empirical results based on simulation and curve fitting may use any of them.
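The empirical route mentioned above (generate chance comparisons, score them, then fit a curve) can be illustrated with a short Python sketch. It is a minimal example under assumptions of our own, not the author's method: the sequence model is i.i.d. DNA with uniform base frequencies, scoring is +1 match / -1 mismatch with no gaps, the shuffle preserves only single-residue composition, and the fitted curve is an extreme-value (Gumbel) distribution estimated by the method of moments. All function names are hypothetical.

```python
import math
import random
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def random_dna(n, rng, probs=(0.25, 0.25, 0.25, 0.25)):
    """Chance definition (iii): an i.i.d. sequence from a simple DNA model (assumed uniform)."""
    return "".join(rng.choices("ACGT", weights=probs, k=n))


def shuffled(seq, rng):
    """Chance definition (ii): a random permutation that keeps single-residue composition only."""
    chars = list(seq)
    rng.shuffle(chars)
    return "".join(chars)


def best_ungapped_score(a, b, match=1, mismatch=-1):
    """Best ungapped local alignment score over every diagonal (running-sum reset at 0)."""
    best = 0
    n, m = len(a), len(b)
    for offset in range(-(n - 1), m):  # offset = j - i identifies a diagonal
        run = 0
        i = max(0, -offset)
        j = i + offset
        while i < n and j < m:
            run = max(0, run + (match if a[i] == b[j] else mismatch))
            best = max(best, run)
            i += 1
            j += 1
    return best


def fit_gumbel_moments(scores):
    """Method-of-moments Gumbel fit: variance = pi^2 / (6 lam^2), mean = mu + gamma / lam."""
    mean = statistics.fmean(scores)
    sd = statistics.stdev(scores)
    lam = math.pi / (sd * math.sqrt(6))
    mu = mean - EULER_GAMMA / lam
    return lam, mu


def p_value(score, lam, mu):
    """Tail probability P(S >= score) under the fitted (continuous) Gumbel."""
    return 1.0 - math.exp(-math.exp(-lam * (score - mu)))


if __name__ == "__main__":
    rng = random.Random(2002)
    query = random_dna(100, rng)
    subject = random_dna(100, rng)
    # Definition (iii): fresh subjects drawn from the sequence model.
    model_scores = [best_ungapped_score(query, random_dna(100, rng)) for _ in range(100)]
    # Definition (ii): shuffles of one subject, composition preserved.
    shuffle_scores = [best_ungapped_score(query, shuffled(subject, rng)) for _ in range(100)]
    for label, scores in (("model", model_scores), ("shuffle", shuffle_scores)):
        lam, mu = fit_gumbel_moments(scores)
        print(f"{label}: lambda={lam:.3f} mu={mu:.2f} P(S>=15)={p_value(15, lam, mu):.3g}")
```

The +1/-1 scores give a negative expected score per aligned pair under the uniform model, which local-alignment statistics require. Because real scores are integers, the continuous Gumbel tail is only an approximation, and a shuffle that preserves only base counts is a weaker null than one preserving higher-order composition.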
Similar resources
A unified statistical framework for sequence comparison and structure comparison.
We present an approach for assessing the significance of sequence and structure comparisons by using nearly identical statistical formalisms for both sequence and structure. Doing so involves an all-vs.-all comparison of protein domains [taken here from the Structural Classification of Proteins (scop) database] and then fitting a simple distribution function to the observed scores. By using thi...
A generalization of Profile Hidden Markov Model (PHMM) using one-by-one dependency between sequences
The Profile Hidden Markov Model (PHMM) can be poor at capturing dependency between observations because of the statistical assumptions it makes. To overcome this limitation, the dependency between residues in a multiple sequence alignment (MSA) which is the representative of a PHMM can be combined with the PHMM. Based on the fact that sequences appearing in the final MSA are written based on th...
Empirical statistical estimates for sequence similarity searches.
The FASTA package of sequence comparison programs has been modified to provide accurate statistical estimates for local sequence similarity scores with gaps. These estimates are derived using the extreme value distribution from the mean and variance of the local similarity scores of unrelated sequences after the scores have been corrected for the expected effect of library sequence length. This...
Statistics of local multiple alignments
Summary: BLAST statistics have been shown to be extremely useful for searching for significant similarity hits, for amino acid and nucleotide sequences. Although these statistics are well understood for pairwise comparisons, there has been little success developing statistical scores for multiple alignments. In particular, there is no score for multiple alignment that is well founded and treated...
Sequence analysis of ORF94 in different White Spot Syndrome Virus (WSSV) isolates of Iran
White spot syndrome virus (WSSV) is a pathogen that causes high mortality in shrimp culture in the whole world. Sequence analysis of WSSV has shown similarity of WSSV isolates in different countries with exception of a few variable genomic loci. This study investigated the sequence variation of some Iranian WSSV isolates and previously identified isolates. Samples were collected during target ...
Molecular characterization of apolipoprotein A-I from the skin mucosa of Cyprinus carpio
Apolipoprotein A-I is the most abundant protein in Cyprinus carpio plasma that plays an important role in lipid transport and protection of the skin by means of its antimicrobial activity. A 527 bp cDNA fragment encoding C terminus part of apoA-I from the skin mucosa of common carp was isolated using RT-PCR. After GenBank database searching, a partial sequence containing a coding sequence (CDS)...
Publication date: 2002